Ensure that the following packages are installed:
knitr.
The purpose of this project is to gain experience in genome-wide analyses of differentially expressed genes using permutation distributions and the False Discovery Rate (FDR).
This project follows the first example provided in Chapter 16, “Two-Color Case Studies,” of the Limma User Guide: Linear Models for Microarray and RNA-Seq Data User’s Guide, specifically section 16.1: “Swirl Zebrafish: A Single-Group Experiment.” The example is augmented by generating some additional plots.
The Limma User Guide describes this data as follows:
“Background. The experiment was carried out using zebrafish as a model organism to study the early development in vertebrates. Swirl is a point mutant in the BMP2 gene that affects the dorsal/ventral body axis. The main goal of the Swirl experiment is to identify genes with altered expression in the Swirl mutant compared to wild-type zebrafish.
The hybridizations. Two sets of dye-swap experiments were performed making a total of four replicate hybridizations. Each of the arrays compares RNA from swirl fish with RNA from normal (“wild type”) fish.”
We consulted the Limma User Guide (pages 76 and 77) to identify the
“targets” file containing metadata about the samples hybridized to the
four arrays in this experiment. The name of the file was added to the
following code chunk, in the call to readTargets. This
parametrized the subsequent call to read.maimages, which
used filenames for image data from each microarray, returning a Limma
RGlist object.
setwd("~/Desktop/swirl")
library(limma)
targets <- readTargets("SwirlSample.txt")
RG <- read.maimages(targets, source="spot")## Read swirl.1.spot
## Read swirl.2.spot
## Read swirl.3.spot
## Read swirl.4.spot
## An object of class "RGList"
## $R
## swirl.1 swirl.2 swirl.3 swirl.4
## [1,] 19538.470 16138.720 2895.1600 14054.5400
## [2,] 23619.820 17247.670 2976.6230 20112.2600
## [3,] 21579.950 17317.150 2735.6190 12945.8500
## [4,] 8905.143 6794.381 318.9524 524.0476
## [5,] 8676.095 6043.542 780.6667 304.6190
## 8443 more rows ...
##
## $G
## swirl.1 swirl.2 swirl.3 swirl.4
## [1,] 22028.260 19278.770 2727.5600 19930.6500
## [2,] 25613.200 21438.960 2787.0330 25426.5800
## [3,] 22652.390 20386.470 2419.8810 16225.9500
## [4,] 8929.286 6677.619 383.2381 786.9048
## [5,] 8746.476 6576.292 901.0000 468.0476
## 8443 more rows ...
##
## $Rb
## swirl.1 swirl.2 swirl.3 swirl.4
## [1,] 174 136 82 48
## [2,] 174 133 82 48
## [3,] 174 133 76 48
## [4,] 163 105 61 48
## [5,] 140 105 61 49
## 8443 more rows ...
##
## $Gb
## swirl.1 swirl.2 swirl.3 swirl.4
## [1,] 182 175 86 97
## [2,] 171 183 86 85
## [3,] 153 183 86 85
## [4,] 153 142 71 87
## [5,] 153 142 71 87
## 8443 more rows ...
##
## $targets
## SlideNumber FileName Cy3 Cy5 Date
## swirl.1 81 swirl.1.spot swirl wild type 2001/9/20
## swirl.2 82 swirl.2.spot wild type swirl 2001/9/20
## swirl.3 93 swirl.3.spot swirl wild type 2001/11/8
## swirl.4 94 swirl.4.spot wild type swirl 2001/11/8
##
## $source
## [1] "spot"
Similarly, on page 78, we identified the “GAL” file containing
metadata about the genes spotted on the arrays. The file name was
included in the following code chunk, in the call to
readGAL.
First, we created image plots for the background intensities of the
first array, swirl.1.
Next, we visualized how this background variation affects M-values computed for the first array. Note that no normalization method is applied when computing the M and A values here.
MA_raw <- normalizeWithinArrays(RG, method="none")
imageplot(MA_raw$M[,1], RG$printer, zlim=c(-3,3))Below is an interpretation of the result, as described on page 80 of the Limma User Guide:
The
imageplot()function lies the slide on its side, so the first print-tip group is bottom left in this plot. We can see a red streak across the middle two grids of the 3rd row caused by a scratch or dust on the array. Spots which are affected by this artifact will have suspect M-values. The streak also shows up as darker regions in the background plots. The red streak seen on the image plot can be seen as a line of spots in the upper right of this plot.
We created an MA-plot for the first array and MA-plots grouped by print-tip for the unnormalized data:
We also generated boxplots for all arrays to visualize the raw M-values without normalization:
library(ggokabeito)
boxplot(MA_raw$M[,1]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.1")boxplot(MA_raw$M[,2]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.2")boxplot(MA_raw$M[,3]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.3")boxplot(MA_raw$M[,4]~MA_raw$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Raw M vs Print-Tip for swirl.4")We normalized within arrays using the default “printtiploess” method and visualized the normalized M-values.
MA_print_tip <- normalizeWithinArrays(RG,method="printtiploess")
# ^^^ method="printtiploess" is the default
plotMD(MA_print_tip)
abline(0,0,col="blue")boxplot(MA_print_tip$M[,1]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.1")boxplot(MA_print_tip$M[,2]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.2")boxplot(MA_print_tip$M[,3]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.3")boxplot(MA_print_tip$M[,4]~MA_print_tip$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess) vs Print-Tip for swirl.4")Following the Limma User Guide’s recommendation, we applied scale normalization across arrays and visualized the results.
MA_print_tip_scale <- normalizeBetweenArrays(MA_print_tip,method="scale")
plotMD(MA_print_tip_scale)
abline(0,0,col="blue")boxplot(MA_print_tip_scale$M[,1]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess + scale) vs Print-Tip for swirl.1")boxplot(MA_print_tip_scale$M[,2]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normlized M (printtiploess + scale) vs Print-Tip for swirl.2")boxplot(MA_print_tip_scale$M[,3]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normalized M (printtiploess + scale)vs Print-Tip for swirl.3")boxplot(MA_print_tip_scale$M[,4]~MA_print_tip_scale$genes$Block,xlab="Print-Tip",ylab="M",col=rep(palette_okabe_ito(1:8),each=2),main="Normalized M (printtiploess + scale) vs Print-Tip for swirl.4")## Found unique target names:
## swirl wild type
## swirl
## swirl.1 -1
## swirl.2 1
## swirl.3 -1
## swirl.4 1
ebayes_print_tip_scale <- eBayes(fit_print_tip_scale) ## the current function is eBayes
qqt(ebayes_print_tip_scale$t,
df=ebayes_print_tip_scale$df.prior+ebayes_print_tip_scale$df.residual,pch=16,cex=0.2)plotMD(ebayes_print_tip_scale)
abline(0,0,col="blue")
top30 <- order(ebayes_print_tip_scale$lods,decreasing=TRUE)[1:30]
text(ebayes_print_tip_scale$Amean[top30],ebayes_print_tip_scale$coef[top30],labels=ebayes_print_tip_scale$genes[top30,"Name"],cex=0.8,col="blue")ordinary.t <- fit_print_tip_scale$coef / fit_print_tip_scale$stdev.unscaled / fit_print_tip_scale$sigma
head(ordinary.t)## swirl
## 1 -2.4751770
## 2 -2.1079491
## 3 -1.8683988
## 4 -1.3537954
## 5 -2.3612873
## 6 0.8435477
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
tibble(moderated_t=unname(ebayes_print_tip_scale$t),ordinary_t=unname(ordinary.t)) |> mutate(shrink=(abs(moderated_t) < abs(ordinary_t))) |> ggplot(aes(x=ordinary_t,y=moderated_t,color=shrink)) + geom_point(alpha=0.5) + labs(title="Effect of Limma Moderation on Gene-Wise t") + scale_color_okabe_ito()## Block Row Column ID Name logFC AveExpr t P.Value adj.P.Val
## 3721 8 2 1 control BMP2 -2.21 12.1 -21.1 1.03e-07 0.000357
## 1609 4 2 1 control BMP2 -2.30 13.1 -20.3 1.34e-07 0.000357
## 3723 8 2 3 control Dlx3 -2.18 13.3 -20.0 1.48e-07 0.000357
## 1611 4 2 3 control Dlx3 -2.18 13.5 -19.6 1.69e-07 0.000357
## 8295 16 16 15 fb94h06 20-L12 1.27 12.0 14.1 1.74e-06 0.002067
## 7036 14 8 4 fb40h07 7-D14 1.35 13.8 13.5 2.29e-06 0.002067
## 515 1 22 11 fc22a09 27-E17 1.27 13.2 13.4 2.44e-06 0.002067
## 5075 10 14 11 fb85f09 18-G18 1.28 14.4 13.4 2.46e-06 0.002067
## 7307 14 19 11 fc10h09 24-H18 1.20 13.4 13.2 2.67e-06 0.002067
## 319 1 14 7 fb85a01 18-E1 -1.29 12.5 -13.1 2.91e-06 0.002067
## 2961 6 14 9 fb85d05 18-F10 -2.69 10.3 -13.0 3.04e-06 0.002067
## 4032 8 14 24 fb87d12 18-N24 1.27 14.2 12.8 3.28e-06 0.002067
## 6903 14 2 15 control Vox -1.26 13.4 -12.8 3.35e-06 0.002067
## 4546 9 14 10 fb85e07 18-G13 1.23 14.2 12.8 3.42e-06 0.002067
## 683 2 7 11 fb37b09 6-E18 1.31 13.3 12.4 4.10e-06 0.002182
## 1697 4 5 17 fb26b10 3-I20 1.09 13.3 12.4 4.30e-06 0.002182
## 7491 15 5 3 fb24g06 3-D11 1.33 13.6 12.3 4.39e-06 0.002182
## 4188 8 21 12 fc18d12 26-F24 -1.25 12.1 -12.2 4.71e-06 0.002209
## 4380 9 7 12 fb37e11 6-G21 1.23 14.0 12.0 5.19e-06 0.002216
## 3726 8 2 6 control fli-1 -1.32 10.3 -11.9 5.40e-06 0.002216
## 2679 6 2 15 control Vox -1.25 13.4 -11.9 5.72e-06 0.002216
## 5931 12 6 3 fb32f06 5-C12 -1.10 13.0 -11.7 6.24e-06 0.002216
## 7602 15 9 18 fb50g12 9-L23 1.16 14.0 11.7 6.25e-06 0.002216
## 2151 5 2 15 control vent -1.40 12.7 -11.7 6.30e-06 0.002216
## 3790 8 4 22 fb23d08 2-N16 1.16 12.5 11.6 6.57e-06 0.002221
## 7542 15 7 6 fb36g12 6-D23 1.12 13.5 11.0 9.23e-06 0.003000
## 4263 9 2 15 control vent -1.41 12.7 -10.8 1.06e-05 0.003326
## 6375 13 2 15 control vent -1.37 12.5 -10.5 1.33e-05 0.004026
## 1146 3 4 18 fb22a12 2-I23 1.05 13.7 10.2 1.57e-05 0.004242
## 157 1 7 13 fb38a01 6-I1 -1.82 10.8 -10.2 1.58e-05 0.004242
## B
## 3721 7.96
## 1609 7.78
## 3723 7.71
## 1611 7.62
## 8295 5.78
## 7036 5.54
## 515 5.48
## 5075 5.48
## 7307 5.40
## 319 5.32
## 2961 5.29
## 4032 5.22
## 6903 5.20
## 4546 5.18
## 683 5.02
## 1697 4.97
## 7491 4.96
## 4188 4.89
## 4380 4.80
## 3726 4.76
## 2679 4.71
## 5931 4.63
## 7602 4.63
## 2151 4.62
## 3790 4.58
## 7542 4.27
## 4263 4.13
## 6375 3.91
## 1146 3.76
## 157 3.75
## Block Row Column ID Name logFC AveExpr t P.Value adj.P.Val
## 3721 8 2 1 control BMP2 -2.21 12.1 -21.1 1.03e-07 0.000357
## 1609 4 2 1 control BMP2 -2.30 13.1 -20.3 1.34e-07 0.000357
## 3723 8 2 3 control Dlx3 -2.18 13.3 -20.0 1.48e-07 0.000357
## 1611 4 2 3 control Dlx3 -2.18 13.5 -19.6 1.69e-07 0.000357
## 8295 16 16 15 fb94h06 20-L12 1.27 12.0 14.1 1.74e-06 0.002067
## 7036 14 8 4 fb40h07 7-D14 1.35 13.8 13.5 2.29e-06 0.002067
## B
## 3721 7.96
## 1609 7.78
## 3723 7.71
## 1611 7.62
## 8295 5.78
## 7036 5.54
## [1] 277 11
## swirl
## Down 156
## NotSig 8171
## Up 121
We compared the number of differentially expressed genes (DEGs) identified by Limma to those obtained using FDR estimation methods, including pi0 estimation.
ebayes.fit.fdr <- p.fdr(pvalues=ebayes_print_tip_scale$p.value,threshold = 0.1)
plot(ebayes.fit.fdr, main="Benjamini-Hochberg FDRs for Swirl")plot(ebayes.fit.fdr,xlim = c(0,1500),ylim=c(0,0.2),main="Blowup for Benjamini-Hochberg FDRs for Swirl")## [1] 277
## [1] 277
## [1] 277
## [1] 277
ebayes.fit.fdr.set.pi0 <- p.fdr(pvalues=ebayes_print_tip_scale$p.value,
set.pi0 = get.pi0(ebayes_print_tip_scale$p.value),threshold = 0.1)
plot(ebayes.fit.fdr.set.pi0,main="Benjamini-Hochberg FDRs for Swirl with pi0 estimation")plot(ebayes.fit.fdr.set.pi0,xlim = c(0,1500),ylim=c(0,0.2),main="Blowup for Benjamini-Hochberg FDRs for Swirl with pi0 estimation")## [1] 516
## [1] 516
## [1] 277
## [1] 277